Simple plotting



In [1]:

    
import pysal as ps
import pandas as pd
import numpy as np

This notebook will cover simple plotting. So that we can visualize plots within the notebook, we first must "turn on" the notebook plotting capabilities.

Commands in a Jupyter notebook that start with % or %% are known as magics. They are directions to the Jupyter kernel itself and are not part of the Python language. The command to enable inline plotting in a notebook is %matplotlib inline. Another magic, %matplotlib notebook, provides some additional tools which we will cover.

The standard Python plotting library, matplotlib, has a special submodule, pyplot, that provides an environment for plotting functions. So, we will import matplotlib.pyplot directly, under the conventional alias plt.



In [2]:

    
import matplotlib.pyplot as plt
%matplotlib inline

Before we do any plotting, let's read in some of the data that we have used before, the historical per-capita income data for US States:



In [3]:

    
path = ps.examples.get_path('usjoin.csv')
#remember, this is a csv, so you should use pandas.read_csv to get a dataframe
data = pd.read_csv(path, index_col='STATE_FIPS') 
W = ps.queen_from_shapefile(ps.examples.get_path('us48.shp'), idVariable='STATE_FIPS')



In [4]:

    
data.head()









    Out[4]:






  
    
      
STATE_FIPS  Name        1929  1930  1931  1932  1933  1934  1935  1936  1937  ...   2000   2001   2002   2003   2004   2005   2006   2007   2008   2009
1           Alabama      323   267   224   162   166   211   217   251   267  ...  23471  24467  25161  26065  27665  29097  30634  31988  32819  32274
4           Arizona      600   520   429   321   308   362   416   462   504  ...  25578  26232  26469  27106  28753  30671  32552  33470  33445  32077
5           Arkansas     310   228   215   157   157   187   207   247   256  ...  22257  23532  23929  25074  26465  27512  29041  31070  31800  31493
6           California   991   887   749   580   546   603   660   771   795  ...  32275  32750  32900  33801  35663  37463  40169  41943  42377  40902
8           Colorado     634   578   471   354   353   368   444   542   532  ...  32949  34228  33963  34092  35543  37388  39662  41165  41719  40093
  

5 rows × 82 columns

Pandas histograms

Pandas provides two simple and fast plotting methods, hist and plot.

hist plots a histogram of the data, and can be called either on the entire dataframe or on an individual series/column. It is configured to be easy to use, and passes most additional keyword arguments down to the underlying matplotlib.pyplot.hist function:



In [5]:

    
data['1929'].hist()









    Out[5]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fb3f4405350>



In [6]:

    
data['1929'].hist(color='black', alpha=.4, orientation='horizontal', bins=10)









    Out[6]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fb3f414fc90>

This pandas histogram method is not as detailed as the matplotlib histogram function itself, but one very useful option it adds is by, which lets you quickly construct histograms by group.

For instance, let's introduce a dummy variable denoting whether or not a state is in the US south:



In [7]:

    
south_dummy = [[u'Alabama', 1],
       [u'Arizona', 0],
       [u'Arkansas', 1],
       [u'California', 0],
       [u'Colorado', 0],
       [u'Connecticut', 0],
       [u'Delaware', 1],
       [u'District of Columbia', 1],
       [u'Florida', 1],
       [u'Georgia', 1],
       [u'Idaho', 0],
       [u'Illinois', 0],
       [u'Indiana', 0],
       [u'Iowa', 0],
       [u'Kansas', 0],
       [u'Kentucky', 1],
       [u'Louisiana', 1],
       [u'Maine', 0],
       [u'Maryland', 1],
       [u'Massachusetts', 0],
       [u'Michigan', 0],
       [u'Minnesota', 0],
       [u'Mississippi', 1],
       [u'Missouri', 0],
       [u'Montana', 0],
       [u'Nebraska', 0],
       [u'Nevada', 0],
       [u'New Hampshire', 0],
       [u'New Jersey', 0],
       [u'New Mexico', 0],
       [u'New York', 0],
       [u'North Carolina', 1],
       [u'North Dakota', 0],
       [u'Ohio', 0],
       [u'Oklahoma', 1],
       [u'Oregon', 0],
       [u'Pennsylvania', 0],
       [u'Rhode Island', 0],
       [u'South Carolina', 1],
       [u'South Dakota', 0],
       [u'Tennessee', 1],
       [u'Texas', 1],
       [u'Utah', 0],
       [u'Vermont', 0],
       [u'Virginia', 1],
       [u'Washington', 0],
       [u'West Virginia', 1],
       [u'Wisconsin', 0],
       [u'Wyoming', 0]]



In [8]:

    
south_dummy = pd.DataFrame(south_dummy, columns=['NAME', 'SOUTH'])

Now, we can merge this with our existing data using the merge method. This operates like any standard table join:



In [9]:

    
data = data.merge(south_dummy, left_on='Name', right_on='NAME')
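As a minimal sketch of what this join does, the toy frames below (three states with their 1929 values from the table above) are merged on differently named key columns. Two behaviors are worth noticing: merge performs an inner join by default, so unmatched rows are dropped, and the result gets a fresh integer index, so the STATE_FIPS index is not carried over.

```python
import pandas as pd

# Toy version of the income table, indexed by FIPS code (1929 values from above).
income = pd.DataFrame(
    {"Name": ["Alabama", "Arizona", "Arkansas"], "1929": [323, 600, 310]},
    index=pd.Index([1, 4, 5], name="STATE_FIPS"),
)

# Toy version of the dummy table; one row has no match in `income`.
dummy = pd.DataFrame(
    [["Alabama", 1], ["Arizona", 0], ["Arkansas", 1], ["District of Columbia", 1]],
    columns=["NAME", "SOUTH"],
)

# The default is an inner join: only names present in both tables survive.
merged = income.merge(dummy, left_on="Name", right_on="NAME")
print(merged)
```

If the original index matters later, one option is to call reset_index() before merging so that STATE_FIPS becomes an ordinary column.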

Now, we can quickly make plots of the per capita income distribution, split up by the dummy variable:



In [10]:

    
data['1990'].hist(by=data.SOUTH)









    Out[10]:





array([<matplotlib.axes._subplots.AxesSubplot object at 0x7fb3f3fa0310>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x7fb3f3fd9c10>], dtype=object)
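The output above is an array with one axes object per group. A small sketch with synthetic data (the values and the column names income and group are assumptions made up for illustration) shows the same behavior:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic data (an assumption for illustration): two groups with different means.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "income": np.concatenate([rng.normal(20, 2, 50), rng.normal(30, 2, 50)]),
    "group": [0] * 50 + [1] * 50,
})

# One histogram panel is drawn per distinct value of the `by` column.
axes = toy["income"].hist(by=toy["group"], bins=10)
n_panels = len(np.ravel(axes))
print(n_panels)
```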

Using `pyplot`

Pyplot, a submodule of matplotlib, is the main driver for most plotting code in Python. It has a few basic commands that we will use to make statistical plots.

Most importantly, though, the matplotlib gallery provides a good reference for different commonly-encountered plotting problems.

First, though, we will cover basic line and point plotting.



In [11]:

    
plt.plot([4,2,1,3])









    Out[11]:





[<matplotlib.lines.Line2D at 0x7fb3f3cda110>]

First, note that when plot is passed a single list or array, the points $(i, y_i)$ are plotted, where $i$ is the zero-based position of the element $y_i$ in the list.

When we pass two lists, matplotlib interprets the first list as the x-coordinates and the second as a list of the y-coordinates.



In [12]:

    
plt.plot([4,2,1,3], [0,5,2,1])









    Out[12]:





[<matplotlib.lines.Line2D at 0x7fb3f3c23450>]
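To check how the arguments were interpreted, we can inspect the Line2D objects that plot returns. This is a small sketch; the variable names are only for illustration.

```python
import matplotlib.pyplot as plt

plt.figure()
# One list: matplotlib supplies x = 0, 1, 2, ... automatically.
(line_one,) = plt.plot([4, 2, 1, 3])
# Two lists: the first is x, the second is y.
(line_two,) = plt.plot([4, 2, 1, 3], [0, 5, 2, 1])

x_implicit = list(line_one.get_xdata())
x_explicit = list(line_two.get_xdata())
print(x_implicit, x_explicit)
```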

There are a few ways to plot many different lines at once. The easiest way to do this is to run multiple plot commands before calling plt.show(). This adds each line generated from plt.plot to the same figure, which is then shown when plt.show() is called.



In [13]:

    
plt.plot([4,2,1,3], [0,5,2,1])
plt.plot([5,2,1,3], [0,4,1,0])
plt.show()

In the same way, any of matplotlib's customization functions called before plt.show() are applied to the current figure:



In [14]:

    
plt.plot([4,2,1,3], [0,5,2,1])
plt.plot([5,2,1,3], [0,4,1,0])
plt.title('Trajectories')
plt.ylabel('$\\theta$', fontsize=20)
plt.xlabel('x')
plt.show()
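The pyplot functions above all act on a "current" axes object that matplotlib tracks behind the scenes. As a sketch, we can fetch that object with plt.gca() and confirm that it holds both lines and the labels we set (the label strings here are made up for illustration):

```python
import matplotlib.pyplot as plt

plt.figure()
plt.plot([4, 2, 1, 3], [0, 5, 2, 1], label="first")
plt.plot([5, 2, 1, 3], [0, 4, 1, 0], label="second")
plt.title("Trajectories")
plt.xlabel("x")
plt.legend()

# plt.gca() returns the axes that the calls above were drawing on.
ax = plt.gca()
print(len(ax.lines), ax.get_title(), ax.get_xlabel())
```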

One very powerful aspect of matplotlib is that, when given a two-dimensional array, it plots each column of the array as a separate line. To show how this is powerful, let's make a plot of the per capita income of all states over time.

Since we have income data from 1929 to 2009, and typing all of those columns would be tedious, let's do it using Python:



In [15]:

    
columns = [str(year) for year in range(1929, 2010) ]



In [16]:

    
columns









    Out[16]:





['1929',
 '1930',
 '1931',
 '1932',
 '1933',
 '1934',
 '1935',
 '1936',
 '1937',
 '1938',
 '1939',
 '1940',
 '1941',
 '1942',
 '1943',
 '1944',
 '1945',
 '1946',
 '1947',
 '1948',
 '1949',
 '1950',
 '1951',
 '1952',
 '1953',
 '1954',
 '1955',
 '1956',
 '1957',
 '1958',
 '1959',
 '1960',
 '1961',
 '1962',
 '1963',
 '1964',
 '1965',
 '1966',
 '1967',
 '1968',
 '1969',
 '1970',
 '1971',
 '1972',
 '1973',
 '1974',
 '1975',
 '1976',
 '1977',
 '1978',
 '1979',
 '1980',
 '1981',
 '1982',
 '1983',
 '1984',
 '1985',
 '1986',
 '1987',
 '1988',
 '1989',
 '1990',
 '1991',
 '1992',
 '1993',
 '1994',
 '1995',
 '1996',
 '1997',
 '1998',
 '1999',
 '2000',
 '2001',
 '2002',
 '2003',
 '2004',
 '2005',
 '2006',
 '2007',
 '2008',
 '2009']

Now, we can use these to grab each year's data from our dataframe:



In [17]:

    
data[columns]









    Out[17]:






  
    
      
    1929  1930  1931  1932  1933  1934  1935  1936  1937  1938  ...   2000   2001   2002   2003   2004   2005   2006   2007   2008   2009
 0   323   267   224   162   166   211   217   251   267   244  ...  23471  24467  25161  26065  27665  29097  30634  31988  32819  32274
 1   600   520   429   321   308   362   416   462   504   478  ...  25578  26232  26469  27106  28753  30671  32552  33470  33445  32077
 2   310   228   215   157   157   187   207   247   256   231  ...  22257  23532  23929  25074  26465  27512  29041  31070  31800  31493
 3   991   887   749   580   546   603   660   771   795   771  ...  32275  32750  32900  33801  35663  37463  40169  41943  42377  40902
 4   634   578   471   354   353   368   444   542   532   506  ...  32949  34228  33963  34092  35543  37388  39662  41165  41719  40093
 5  1024   921   801   620   583   653   706   806   860   769  ...  40640  42279  42021  42398  45009  47022  51133  53930  54528  52736
 6  1032   857   775   590   564   645   701   868   949   795  ...  31255  32664  33463  34123  35998  37297  39358  40251  40698  40135
 7   518   470   398   319   288   348   376   450   487   460  ...  28145  28852  29499  30277  32462  34460  36934  37781  37808  36565
 8   347   307   256   200   204   244   268   302   313   290  ...  27940  28596  28660  29060  29995  31498  32739  33895  34127  33086
 9   507   503   374   274   227   403   399   475   423   426  ...  24180  25124  25485  25912  27846  29003  30954  32168  32322  30987
10   948   807   671   486   437   505   573   650   731   648  ...  32259  32808  33325  34205  35599  36825  39220  41238  42049  40933
11   607   514   438   310   294   359   421   481   547   472  ...  27011  27590  28059  29089  30126  30768  32305  33151  33978  33174
12   581   510   400   297   253   269   425   393   523   458  ...  26723  27315  28232  28835  31027  31656  33177  35008  36726  35983
13   532   467   401   266   250   287   362   387   428   383  ...  27816  28979  29067  30109  31181  32367  34934  36546  37983  37036
14   393   325   291   211   205   233   265   294   341   297  ...  24294  24816  25297  25777  26891  27881  29392  30443  31302  31250
15   414   355   318   241   227   265   290   330   353   348  ...  23334  25116  25683  26434  27776  29785  33438  34986  35730  35151
16   601   576   491   377   371   416   430   506   510   471  ...  25623  27068  27731  28727  30201  30721  32340  33620  34906  35268
17   768   712   638   512   466   523   548   618   665   633  ...  33872  35430  36293  37309  39651  41555  43990  45827  47040  47159
18   906   836   759   613   559   609   643   714   732   672  ...  37992  39247  39238  39869  41792  43520  46893  49361  50607  49590
19   790   657   540   394   347   453   530   619   685   572  ...  29612  30196  30410  31446  31890  32516  33452  34441  35215  34280
20   599   552   457   363   308   358   451   472   540   494  ...  32101  32835  33553  34744  36505  37400  39367  41059  42299  40920
21   286   202   175   127   131   174   177   229   224   201  ...  20993  22222  22540  23365  24501  26120  27276  28772  29591  29318
22   621   561   491   365   334   367   420   466   508   475  ...  27445  28156  28771  29702  30847  31644  33354  34558  35775  35106
23   592   501   382   339   298   364   476   475   512   517  ...  22569  24342  24699  25963  27517  28987  30942  32625  33293  32699
24   596   521   413   307   275   259   409   396   415   405  ...  27829  29098  29499  31262  32371  33395  34753  36880  38128  37057
25   868   833   652   550   495   546   658   843   762   780  ...  30529  30718  30849  32182  34757  37555  38652  40326  40332  38009
26   686   647   558   427   416   476   498   537   565   533  ...  33332  33940  34335  34892  36758  37536  39997  41720  42461  41882
27   918   847   736   587   523   573   625   709   747   697  ...  36983  37959  38240  38768  40603  42142  45668  48172  49233  48123
28   410   334   289   208   211   247   292   343   362   338  ...  22203  24193  24446  25128  26606  28180  29778  31320  32585  32197
29  1152  1035   881   676   626   680   722   808   838   789  ...  34547  35371  35332  36077  38312  40592  43892  47514  48692  46844
30   332   292   248   187   208   253   271   297   324   295  ...  27194  27650  27726  28208  29769  31209  32692  33966  34340  33564
31   382   311   187   176   146   180   272   234   326   282  ...  25068  26118  26770  29109  29676  31644  32856  35882  39009  38672
32   771   661   563   400   385   455   516   593   648   561  ...  28400  28966  29522  30345  31240  32097  33643  34814  35521  35018
33   455   368   301   216   222   252   298   321   376   346  ...  23517  25059  25059  25719  27516  29122  31753  32781  34378  33708
34   668   607   505   379   358   439   458   548   556   531  ...  28350  28866  29387  30172  31217  32108  34212  35279  35899  35210
35   772   712   600   449   417   482   517   601   636   563  ...  29539  30085  30840  31709  33069  34131  36375  38003  39008  38827
36   874   788   711   575   559   600   645   711   731   672  ...  29685  31378  32374  33690  35318  36461  38610  40421  41542  41283
37   271   243   205   159   175   211   229   258   273   250  ...  24321  24871  25279  25875  27057  28337  29990  30958  31510  30835
38   426   366   241   189   129   184   309   244   323   320  ...  26115  27531  27727  30072  31765  32726  33320  35998  38188  36499
39   378   325   277   198   204   245   264   304   334   300  ...  26239  27059  27647  28501  29734  30764  32314  33578  34243  33512
40   479   412   348   266   257   294   326   372   418   404  ...  27871  28519  28295  28929  30392  32448  34489  36020  36969  35674
41   551   498   369   305   298   310   389   463   444   444  ...  23907  24899  25010  25192  26169  27905  29582  31009  31253  30107
42   634   576   474   365   338   383   414   471   485   457  ...  26901  28140  28651  29609  31240  31920  34394  36018  36940  36752
43   434   384   370   284   285   320   350   390   423   390  ...  31162  32747  33235  34451  36285  38304  40644  42506  43409  43211
44   741   658   534   402   376   443   490   569   599   582  ...  31528  32053  32206  32934  34984  35738  38477  40782  41588  40619
45   460   408   356   257   259   313   337   390   418   370  ...  21915  23333  24103  24626  25484  26374  28379  29769  31265  31843
46   673   588   469   362   333   380   461   518   551   507  ...  28232  29161  29838  30657  31703  32625  34535  35839  36594  35676
47   675   585   476   374   371   411   496   551   607   561  ...  27230  29122  29828  31544  33721  36683  41548  43453  45177  42504
  

48 rows × 81 columns

Now, when plot is passed a two-dimensional array or a list of lists, it interprets each column as its own line, plotted against the shared x-values. In the table above, each row is a state and each column is a year. So, if we want one line per state showing the change in per-capita income over time, the states need to be in the columns and the years in the rows.

Fortunately, we can simply transpose the values matrix and get what we need:



In [18]:

    
plt.plot(columns, data[columns].values.T)
plt.title('Raw Per Capita Income, 1929-2009')
plt.xlabel('Year')
plt.ylabel('Constant dollars per person')
plt.show()
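A tiny version of the same call makes the orientation easier to see. This sketch uses the 2007-2009 values for Alabama and Arizona from the table above; the variable names are only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

years = [2007, 2008, 2009]
# Rows are states, columns are years (2007-2009 values for Alabama and Arizona).
table = np.array([[31988, 32819, 32274],
                  [33470, 33445, 32077]])

plt.figure()
# After transposing, the shape is (3 years, 2 states): one column per state,
# so matplotlib draws two lines with three points each.
lines = plt.plot(years, table.T)
print(len(lines))
```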

If we wanted to normalize the data, we could do this in an array-wise fashion. Here, we subtract each year's mean across states and divide by that year's standard deviation:



In [19]:

    
centered_pci = data[columns].values - data[columns].values.mean(axis=0)
normalized_pci = centered_pci / centered_pci.var(axis=0)**.5
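The two lines above rely on NumPy broadcasting: the per-year means have one entry per column and are subtracted from every row. A minimal sketch on the 1929-1931 values of the first three states checks that each year ends up with mean zero and unit standard deviation:

```python
import numpy as np

# Rows are states, columns are years (1929-1931 values for the first three states).
pci = np.array([[323.0, 267.0, 224.0],
                [600.0, 520.0, 429.0],
                [310.0, 228.0, 215.0]])

# mean(axis=0) has one entry per column; broadcasting subtracts it from every row.
centered = pci - pci.mean(axis=0)
# Dividing by the per-column standard deviation gives z-scores within each year.
normalized = centered / centered.std(axis=0)

print(normalized.mean(axis=0))
print(normalized.std(axis=0))
```

Note that std(axis=0) is the same quantity as var(axis=0)**.5 used in the cell above.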



In [20]:

    
plt.plot(columns, normalized_pci.T)
plt.title('Normalized Per Capita Income change, 1929-2009')
plt.xlabel('Year')
plt.ylabel('Dollars/person normalized by year')
plt.show()

Again, since matplotlib interprets the first list as x-coordinates and the second as y-coordinates, we can make scatterplots very quickly.

For instance, we can make a Moran scatterplot, a common spatial dependence diagnostic plot, from this data. First, we need to grab the last year's income data:



In [21]:

    
last = data['2009'].values

Then, we can use the pysal.lag_spatial function, along with our row-standardized weights, to construct the spatial lag of the 2009 per capita income:



In [22]:

    
W.transform = 'r'



In [23]:

    
Wlast = ps.lag_spatial(W, last)
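With row-standardized weights, the spatial lag of a variable is the average of that variable over each observation's neighbors. The following is a dense-matrix sketch of that computation on a hypothetical four-region example. It is meant to illustrate the idea, not to reproduce how PySAL stores or multiplies its weights.

```python
import numpy as np

# Hypothetical example: four regions in a row, each adjacent to its neighbors.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)

# Row standardization: divide each row by its sum so the weights sum to one.
weights = adjacency / adjacency.sum(axis=1, keepdims=True)

y = np.array([10.0, 20.0, 30.0, 40.0])
# The spatial lag is the weighted average of each region's neighbors.
lag = weights @ y
print(lag)
```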

Since this is a scatterplot, we don't want to use the default drawing behavior, which connects all points plotted with a line.

Matplotlib has two interfaces for changing line parameters. The first uses a format string to specify plotting parameters such as color and marker; the string can contain a color and a marker style in any order. The second uses keyword arguments such as color, marker, and linestyle.

For example, the format string '.k' plots each point as a small black dot:



In [24]:

    
plt.plot(last, Wlast, '.k')









    Out[24]:





[<matplotlib.lines.Line2D at 0x7fb3f3b2bb10>]

For larger dots, you can use the o marker:



In [25]:

    
plt.plot(last, Wlast, 'ok')









    Out[25]:





[<matplotlib.lines.Line2D at 0x7fb3f37403d0>]
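The same styling can also be written with keyword arguments instead of a format string. As a sketch on made-up points, the two calls below should produce lines with the same marker and no connecting line:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [2, 1, 3]

plt.figure()
# Format-string interface: '.' is the marker and 'k' is black.
(fmt_line,) = plt.plot(x, y, ".k")
# Keyword interface: the same style, spelled out explicitly.
(kw_line,) = plt.plot(x, y, color="k", marker=".", linestyle="none")

print(fmt_line.get_marker(), kw_line.get_marker())
```

The keyword form is longer, but it tends to be easier to read once several style options are combined.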

To add vertical, horizontal, or sloped lines to the plot, you use different line plotting functions.

For a simple scatter plot with line of best fit, you'll need to estimate a regression on the data and use the slope and intercept from that. We can do this very quickly using PySAL:



In [26]:

    
reg = ps.spreg.OLS(Wlast.reshape(-1,1), last.reshape(-1,1))
# b, a = np.polyfit(last, Wlast, 1) will also work (polyfit returns the slope first)



In [27]:

    
a, b = reg.betas.flatten()  # intercept a and slope b, as scalars

Finally, to put it all together, we will draw dashed vertical and horizontal lines through the means of X and Y, and will draw the line of best fit. In addition, we can annotate the plot with text. Because the weights are row-standardized, the slope of the line of best fit is Moran's $I$, so we will add it to the plot:



In [28]:

    
plt.plot(last, Wlast, 'ok')
# dashed vertical line at the mean of the last year's PCI
plt.vlines(last.mean(), Wlast.min(), Wlast.max(), linestyle='--')
# dashed horizontal line at the mean of the lagged PCI
plt.hlines(Wlast.mean(), last.min(), last.max(), linestyle='--')
# red line of best fit
plt.plot(last, a + b*last, 'r')
plt.text(s='$I = %.3f$' %b, x=39000, y=41000, fontsize=18)
plt.title('Moran Scatterplot')
plt.ylabel('Spatial Lag of PCI')
plt.xlabel('2009 PCI')









    Out[28]:





<matplotlib.text.Text at 0x7fb3f1e5a3d0>
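To see why the slope can be read as Moran's $I$, here is a minimal NumPy sketch on a hypothetical ring of five regions with made-up values. It compares the slope from np.polyfit with $I$ computed directly as $z'Wz / z'z$, where $z$ is the centered variable; this form of $I$ assumes row-standardized weights.

```python
import numpy as np

# Hypothetical example: five regions on a ring, each with two neighbors.
n = 5
adjacency = np.zeros((n, n))
for i in range(n):
    adjacency[i, (i - 1) % n] = 1.0
    adjacency[i, (i + 1) % n] = 1.0
weights = adjacency / adjacency.sum(axis=1, keepdims=True)  # row-standardized

y = np.array([1.0, 2.0, 4.0, 3.0, 7.0])
lag = weights @ y

# np.polyfit returns coefficients from the highest degree down: slope first.
slope, intercept = np.polyfit(y, lag, 1)

# Moran's I for row-standardized weights: z'Wz / z'z, with z the centered data.
z = y - y.mean()
morans_i = (z @ weights @ z) / (z @ z)
print(slope, morans_i)
```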

Seaborn

In addition to matplotlib, the most commonly used plotting library in Python, it is good to know about the seaborn library, which is dedicated to making simple statistical plots.

Like matplotlib, seaborn has a deep gallery with many different types of visualizations. Seaborn's focus is on quick but pretty statistical visualizations, so it comes with many more specialized plot types than the distribution plots shown above.

In a few ways, it'll help us make better plots, and can especially help when reusing plots in various contexts.

There are too many different dedicated statistical plot types in seaborn to cover now, but we'll show how seaborn affects standard plotting in matplotlib, as well as show some of the more useful pre-baked plot types in seaborn.



In [29]:

    
import seaborn as sns









    




First, let's re-plot the Moran scatterplot from above:



In [30]:

    
plt.plot(last, Wlast, 'ok')
# dashed vertical line at the mean of the last year's PCI
plt.vlines(last.mean(), Wlast.min(), Wlast.max(), linestyle='--')
# dashed horizontal line at the mean of the lagged PCI
plt.hlines(Wlast.mean(), last.min(), last.max(), linestyle='--')
# red line of best fit
plt.plot(last, a + b*last, 'r')
plt.text(s='$I = %.3f$' %b, x=39000, y=41000, fontsize=18)
plt.title('Moran Scatterplot')
plt.ylabel('Spatial Lag of PCI')
plt.xlabel('2009 PCI')









    Out[30]:





<matplotlib.text.Text at 0x7fb3f0eb1e50>

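Note that it looks quite different. In the seaborn version used here, importing seaborn changes a few of the basic graphical parameters of matplotlib. (Since seaborn 0.8, these defaults are applied only after an explicit call such as sns.set().)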

Once imported, the set_context function can be used to scale the content of a graph up or down, depending on the context in which it might be used.

For instance, let's look at the difference between the paper context:



In [31]:

    
sns.set_context('paper')

plt.plot(last, Wlast, 'ok')
# dashed vertical line at the mean of the last year's PCI
plt.vlines(last.mean(), Wlast.min(), Wlast.max(), linestyle='--')
# dashed horizontal line at the mean of the lagged PCI
plt.hlines(Wlast.mean(), last.min(), last.max(), linestyle='--')
# red line of best fit
plt.plot(last, a + b*last, 'r')
plt.text(s='$I = %.3f$' %b, x=39000, y=41000, fontsize=18)
plt.title('Moran Scatterplot')
plt.ylabel('Spatial Lag of PCI')
plt.xlabel('2009 PCI')









    Out[31]:





<matplotlib.text.Text at 0x7fb3f0eac550>

and the talk context:



In [32]:

    
sns.set_context('talk')

plt.plot(last, Wlast, 'ok')
# dashed vertical line at the mean of the last year's PCI
plt.vlines(last.mean(), Wlast.min(), Wlast.max(), linestyle='--')
# dashed horizontal line at the mean of the lagged PCI
plt.hlines(Wlast.mean(), last.min(), last.max(), linestyle='--')
# red line of best fit
plt.plot(last, a + b*last, 'r')
plt.text(s='$I = %.3f$' %b, x=39000, y=41000, fontsize=18)
plt.title('Moran Scatterplot')
plt.ylabel('Spatial Lag of PCI')
plt.xlabel('2009 PCI')









    Out[32]:





<matplotlib.text.Text at 0x7fb3f3bac210>

The text and line sizes get rescaled quite significantly.

But seaborn can generate many other plots as well. Two of the most useful tend to be kdeplot and distplot.

Kernel density plots

Kernel density plots are a common and powerful plotting technique for visualizing distributions. Seaborn makes it easy to draw KDE plots in one and two dimensions:



In [33]:

    
sns.kdeplot(last)
plt.title('One-Dimensional KDE plot')
plt.xlabel('PCI')









    Out[33]:





<matplotlib.text.Text at 0x7fb3f3c2b2d0>
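For intuition about what a kernel density estimate computes, the sketch below uses scipy.stats.gaussian_kde directly rather than seaborn, on a synthetic sample (the data and variable names are assumptions for illustration). Seaborn's own bandwidth defaults may differ, so the curve need not match kdeplot exactly.

```python
import numpy as np
from scipy import stats

# Synthetic sample (an assumption for illustration), roughly the size of our data.
rng = np.random.default_rng(0)
sample = rng.normal(loc=35000, scale=5000, size=48)

# gaussian_kde places a Gaussian bump on every observation and averages them.
kde = stats.gaussian_kde(sample)

# Evaluate the estimate on a grid that extends well past the data.
pad = 3 * sample.std()
grid = np.linspace(sample.min() - pad, sample.max() + pad, 500)
density = kde(grid)

# A density should integrate to (approximately) one.
area = np.sum(density) * (grid[1] - grid[0])
print(area)
```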



In [34]:

    
sns.kdeplot(last, Wlast)
plt.title('Two-Dimensional KDE plot')
plt.ylabel('Lag PCI')
plt.xlabel('PCI')









    Out[34]:





<matplotlib.text.Text at 0x7fb3f3b0bbd0>

Instead of doing a two-dimensional plot, if you would like to plot two distributions in one frame, use two separate calls to sns.kdeplot, followed by a plt.show() call.

In addition, when using more than one line, you can label each line being plotted and use a legend by providing a label argument to each line plotting function, and then calling plt.legend().



In [35]:

    
plt.title('Kernel Densities of 2009 PCI')
sns.kdeplot(last, label='2009 PCI')
sns.kdeplot(Wlast, label='Lagged 2009 PCI')
plt.legend()
plt.show()

As always, there are many more options, so consult the documentation for more information about the flexibility of the kdeplot function.



In [36]:

    
sns.kdeplot?

Distplot

In many cases, we would like to visualize how well our data fit a specific distributional form. We can do this with seaborn via distplot.

By default, distplot will draw a histogram with a kernel density estimate overlaid:



In [37]:

    
sns.distplot(last)









    Out[37]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fb3f0d6fa50>

But when supplied a distribution from scipy, the standard scientific Python library, distplot will fit that distribution to the data by estimating its parameters, and draw the fitted density over the histogram:



In [38]:

    
import scipy.stats as stats



In [39]:

    
sns.distplot(last, fit=stats.norm, kde=False)









    Out[39]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fb3f3955b50>



In [40]:

    
sns.distplot(last, fit=stats.gamma, kde=False)









    Out[40]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fb3f0b6f550>
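The parameter estimation behind the fit argument can be sketched with scipy alone. Each continuous distribution in scipy.stats has a fit method that returns maximum-likelihood estimates; for the normal distribution these are the sample mean and the (population) standard deviation. The sample below is synthetic and only for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic sample (an assumption for illustration).
rng = np.random.default_rng(1)
sample = rng.normal(loc=35000, scale=5000, size=48)

# fit returns maximum-likelihood estimates; for the normal these are (loc, scale).
loc, scale = stats.norm.fit(sample)

# The fitted curve is the distribution's pdf evaluated at those estimates.
grid = np.linspace(sample.min(), sample.max(), 200)
fitted_pdf = stats.norm.pdf(grid, loc, scale)
print(loc, scale)
```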

Heatmap

Heatmaps are useful for visualizing the structure of sparse matrices.

Here, we can visualize our spatial weights matrix for the 48 contiguous US States:



In [41]:

    
sns.heatmap(W.full()[0])
plt.xticks([])
plt.yticks([])
plt.show()

LMPlot



In [80]:

    
columbus = ps.pdio.read_files(ps.examples.get_path('columbus.shp'))
Wco = ps.queen_from_shapefile(ps.examples.get_path('columbus.shp'))
columbus['downtown'] = columbus.DISCBD < columbus.DISCBD.describe()['25%']
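The last line above builds a boolean column from the first quartile of DISCBD. As a small sketch on hypothetical distances, describe()['25%'] gives the same cutoff as quantile(0.25), and the comparison yields a True/False flag per row:

```python
import pandas as pd

# Hypothetical distances to the central business district.
distance = pd.Series([0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0])

# describe()['25%'] is the first quartile, the same value as quantile(0.25).
cutoff = distance.describe()["25%"]
downtown = distance < cutoff
print(cutoff, downtown.sum())
```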



In [76]:

    
sns.lmplot('INC', 'HOVAL',columbus, hue='downtown')









    Out[76]:





<seaborn.axisgrid.FacetGrid at 0x7fb3eac5f1d0>

Pairgrid



In [92]:

    
Wco.transform = 'r'
columbus['lag_HOVAL'] = ps.lag_spatial(Wco,columbus['HOVAL'].values)
sns.pairplot(columbus, kind='reg', vars=['HOVAL', 'lag_HOVAL', 'CRIME'], diag_kind='kde')









    Out[92]:





<seaborn.axisgrid.PairGrid at 0x7fb3e0ac2710>
